Haoqi Fan

Representation Forcing for Bottleneck-Free Unified Multimodal Models

May 29, 2026

Context Unrolling in Omni Models

Apr 23, 2026

Continuous Adversarial Flow Models

Apr 13, 2026

LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

Oct 27, 2025

Emerging Properties in Unified Multimodal Pretraining

May 20, 2025

Seed1.5-VL Technical Report

May 11, 2025

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

Mar 10, 2025

Causal Diffusion Transformers for Generative Modeling

Dec 17, 2024

LLaVA-Critic: Learning to Evaluate Multimodal Models

Oct 03, 2024

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Jun 01, 2023